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ABSTRACT 



We have developed a simulation code with the techniques which enhance both spatial and 
time resolution of the PM method for which the spatial resolution is restricted by the spacing 
of structured mesh. The adaptive mesh refinement (AMR) technique subdivides the cells which 
satisfy the refinement criterion recursively. The hierarchical meshes are maintained by the spe- 
cial data structure and are modified in accordance with the change of particle distribution. In 
general, as the resolution of the simulation increases, its time step must be shortened and more 
computational time is required to complete the simulation. Since the AMR enhances the spatial 
resolution locally, we reduce the time step locally also, instead of shortening it globally. For 
this purpose we used a technique of hierarchical time steps (HTS) which changes the time step, 
from particle to particle, depending on the size of the cell in which particles reside. Some test 
calculations show that our implementation of AMR and HTS is successful. We have performed 
cosmological simulation runs based on our code and found that many of halo objects have density 
profiles which are well fitted to the universal profile proposed by Navarro, Frenk, & White (1996) 
over the entire range of their radius. 

Subject headings: galaxies: formation — large-scale structure of universe — methods: n-body simula- 
tions — methods: numerical 

1. Introduction I0 5 . 



Simulating the formation process of galaxies in 
clusters and fields simultaneously is one of the 
promising strategies to study how statistical prop- 
erties of observed galaxies have originated. For 
this purpose, the box size in which the simulation 
is performed must be larger than a typical cluster 
separation, which is a few tens to a hundred Mpc. 
At the same time, in order to reproduce the intrin- 
sic dynamical structure of each simulated galaxy, 
the Newtonian gravity should be calculated on a 
scale well below the characteristic length of indi- 
vidual galaxies, which is a few kpes. Hence, for 
the simulation of formation of multiple galaxies, 
the ratio of box size to minimum resolved scale or 
the spatial dynamic range must be greater than 
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In addition, there is another dynamic range in 
./V-body systems often called the mass dynamic 
range, or simply, the number of particles if the 
mass of particles is the same. Strictly speak- 
ing, dark matter component evolves according to 
the collisionless Boltzmann equation. However, 
since computer resources are finite, the distribu- 
tion function of ideal collisionless component is 
replaced by a finite sum of localized distribution 
functions corresponding to the finite number of 
particles in the A-body simulation. This causes 
two-body relaxation which artificially affects the 
evolution of dark matter distribution. The effect 
of relaxation can be suppressed by introducing 
more particles. We crudely estimate how many 
particles are required to trace the evolution of 
galactic dark halos in a cosmological simulation by 
comparing the relaxation time scale of dark halos 
and their age. Our naive estimation gives N ~ 10 
at the minimum (see Appendix A). 



It is instructive to examine whether the particle- 
mesh (PM) method is capable of the required spa- 
tial dynamic range of 10 5 and the required mass 
dynamic range of 10 9 . The spatial dynamic range 
of the PM method is restricted by the mesh spac- 
ing. When the number of meshes is taken as many 
as the number of particles, the spatial dynamic 
range is about 10 3 for a simulation with 10 9 par- 
ticles, which is two orders smaller than required. 
However, a simulation with 10 15 meshes and par- 
ticles, whose spatial dynamic range is about 10 5 , 
is not only unrealistic today, but also inefficient. 
The adaptive mesh refinement (AMR) is one of 
the prescriptions to resolve this problem by in- 
troducing finer meshes hierarchically in regions 
where higher spatial resolution is required (Berger 
& Oliger 1984). 

First, Villumsen (1989) introduced cubic sub- 
grids hierarchically into the PM method to in- 
crease its spatial resolution. While his hierarchical 
PM code places adaptive meshes by hand, sub- 
grids are automatically located in the particle- 
multiple mesh code developed by Jessop, Duncan, 
& Chau (1994). Gelato, Chernoff, & Wasserman 
(1997) extended their methods to handle isolated 
TV-body systems. Their code is applicable to non- 
cosmological problems. On the other hand, adopt- 
ing the multigrid method as the Poisson solver, 
Suisalu & Saar (1995) developed a code which 
can treat rectangular adaptive meshes. Further- 
more, Kravtsov, Klypin, & Khokhlov (1997) de- 
veloped the adaptive refinement tree (ART) code 
which subdivides all cells which satisfy the user de- 
fined refinement criterion regardless of the shape 
of the boundary between coarse meshes and re- 
fined meshes. 

Calculating the force between particles is the 
most time consuming part in A-body methods. In 
order to overcome this problem, Aarseth (1963) 
changed the timing to calculate the force, from 
particle to particle, depending on the time scale 
of force variation. This technique, named indi- 
vidual time steps is implemented in the ART code 
(Kravtsov et al. 1998) with modification such that 
the time steps of particles arc determined by the 
local density, though not described in detail in 
their paper. We also adopted the modified individ- 
ual time steps, or hierarchical time steps (HTS) so 
that our code is adaptive not only in space but also 
in time, or our code is four dimensionally adaptive. 



Furthermore, our code is designed to incorporate 
the hydrodynamics code with AMR (Yahagi & 
Yoshii 2000), similar to the codes developed by 
Anninos, Norman, & Clarke (1994) and Norman 
& Bryan (1999). While the refined regions are re- 
stricted to be rectangular parallelepipeds in their 
codes, this limitation has been relaxed in ours. 

In §2 we review the basic equations for the 
cosmological A-body simulation and describe our 
code in detail putting emphasis on the parts re- 
lated to AMR and HTS. Details of tests and the 
results are discussed in §3. Finally, we summarize 
this work in §4. 

2. Description of the code 
2.1. Basic equations 

For the cosmological simulation, the comoving 
coordinate system is suitable to represent particle 
position: 



x = a r, 



(1) 



where a is the scale factor, and x and r are comov- 
ing and proper positions, respectively. However, in 
order to incorporate the hydrodynamics code, the 
proper coordinate system is suitable to represent 
the velocity: 



dr a 
u = — r. 

at a 



(2) 



Using these variables, the basic equations for N- 
body systems in the expanding universe are given 
by: 



dx 
~dt 

d(au) 
dt 



1 



-u, 



= g(x), 



g(x) = -V<j)(x), 
4ttG , 



(3) 

(4) 
(5) 



Ad>(x) = —( P (x)-p), (6) 



and 



a = iW(l - ^0 - A ) + a-^o + a 2 A -(7) 

Note that p denotes the comoving density here. 
How to discretize these equations depends on the 
problem to be solved and the strategy for it. Since 
the primary objective of our code is to perform 
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the cosmological simulation with a wide dynamic 
range in both space and mass, we must save mem- 
ory as much as possible. Thus using the variable 
time-step leapfrog integrator which does not re- 
quire additional memory, we discretize equations 
(3) and (4) as follows: 

Xi+i = Xi + — — Uj+i/2 Ai i; (8) 



u, 



+ 1/2 - 



Xi + - 

a i-l/2 



-U 



a i+l/2 



1-1/2 + 



Ati_i + A, 



a i+l/2 



Among a variety of methods to solve equations 
(5) and (6), we adopt the mesh based method, 
such as the PM method, which calculates the force 
between particles through the following steps: (i) 
Mass of particles is assigned to their adjacent grid 
nodes, (ii) Discretized Poisson equation is solved 
to give the potential on each node, (iii) Force on 
each node is calculated as the difference of the po- 
tential on nearby nodes, (iv) Force on particles 
is interpolated from those on the adjacent nodes. 
We discuss the mass assignment (i), force inter- 
polation (iv), and potential differentiation (iii) in 
§2.4, while the Poisson solver (ii) in §2.5. 

2.2. Hierarchical mesh 

The PM method uses a cubic mesh to calculate 
the force on particles. Therefore, the spatial res- 
olution of the PM method is limited by the spac- 
ing of this mesh. Like the ART code developed 
by Kravtsov et al. (1997), our code overcomes this 
limitation by subdividing the cells in regions where 
higher spatial resolution is required. This adap- 
tive mesh refinement corresponds to adding data 
sets for small cells onto the homogeneous and cu- 
bic base mesh hierarchically. In order to maintain 
and modify the hierarchical mesh structure, it is 
required to access the parent, child, and neigh- 
bor cells. The storage for pointers to the parent 
and neighbor cells can be saved by grouping the 
cells which have the same parent cell (Khokhlov 
1998). Since one refinement process divides a cell 
into eight half-sized cells, we group these eight 
cells together and call this group a cell octet (Fig. 
1). Each cell octet keeps a pointer to the parent 
cell and pointers to the six neighboring cell octets. 
In the iV-body code, in addition to these data, a 
pointer to the first particle in the cell octet, the 
number of particles in the cell octet, and the in- 
tegral position of the cell octet are also stored. 



The integral position is the position rounded off 
to integer. While cell octets residing in the same 
level are connected by the doubly linked lists in the 
ART code, our code stores the cell octets having 
the same level in an array consecutively. The level 
of a cell, L, is defined as L = log 2 (Zo/^i), where 
lo and II are the sizes of the simulation box and 
the cell, respectively. We use cells having integral 
level only. 

Moreover, the same level cell octets are sorted 
by the Morton ordering (Barnes & Hut 1989; War- 
ren & Salmon 1993). The key for the Morton or- 
dering, /cm, is given by 

k M = J2 (2 3i ' fc L + 2 3Li+1 k v Li + 2 3L * +2 fc£l§),) 

where Lo is the level of the smallest hierarchical 
mesh, and fc£ . , k\ . , and k z L . are the Lj-th bit of 
the integral position of the cell octet normalized 
by Il d (x,y, and z). Thus, 

Id 



i,=0 

y = J2 2L °~ Lik l> ( 12 ) 

(13) 



Li=0 
Ld 

I = J2 2 LD ~ Li k z Li . 

Li=0 



We do not sort particles as Warren and Salmon's 
hashed oct-tree code (Warren & Salmon 1993), be- 
cause it is incompatible with HTS, though sorting 
hierarchical cells is compatible with HTS. The hi- 
erarchical mesh is maintained using the cell octets 
as the building blocks (Fig. 2), which are added 
or removed from meshes freely. 

Basically each cell has only one pointer to the 
child octet, but the storage changes depending on 
what physics is incorporated in the code. In the 
code for the self gravitating system, density and 
potential is stored. No additional storage is re- 
quired for the ./V-body code, since the storage for 
a pointer to the first particle in the cell is shared 
by the pointer to the child cell octet. The hydro- 
dynamics code requires storage for fluid density, 
specific energy and fluid velocity in each cell. 

The condition for subdividing the cells, or the 
refinement criterion, plays an important role in 
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the AMR. In general context of the AMR, the re- 
finement criterion is defined using error estima- 
tors such as the residual of the partial differential 
equation to be solved. However, since we intro- 
duced the AMR to keep the mass dynamic range 
all the way from homogeneous to clustered config- 
urations, we refined all cells which contain iV r f n 
particles and more as the ART code. This crite- 
rion sets the upper limit to the mass in unrefined 
cells. We set N T ^ n = 4 throughout this paper un- 
less otherwise stated. Following this refinement 
criterion, the hierarchical meshes are constructed 
recursively on the cubic structured mesh whose 
base level is Lb, until they reach the dynamic 
range level, Lp. At the same time, the cells which 
do not satisfy the refinement criterion are removed 
from the hierarchical meshes. In addition to the 
refined cells which satisfy the refinement criterion, 
we also subdivide those cells, named buffer cells, 
which are adjacent to the refined cells (Fig. 3). 

For the illustrative purpose, we take an exam- 
ple from a slice of the cosmological simulation for 
an LCDM universe described in §3.4. The left 
panel of Figure 4 shows the distribution of par- 
ticles, and the right panel shows the configuration 
of hierarchical meshes placed in accordance with 
the above prescription. While the cells including 
the sheet structure are refined once, more levels of 
hierarchical meshes are placed in the regions where 
massive halos reside. 

2.3. Individual time steps 

We adopted a time stepping scheme called the 
hierarchical time steps (e.g. Makino 1991). The 
time step of each particle is not the same but re- 
stricted to 2~ l times of the longest time steps. 
We chose i to be the refinement level, L — L B , 
(Kravtsov et al. 1998) so that the trajectory of 
the level L particle, which is included in the level 
L refined cell but not in any level L + 1 refined 
cells, is integrated by the time interval AtL which 
is related to the time interval for the base level Lb 
particles, AtL B , as 

At L = 2 LB - L At Ls . (14) 

In our code, AtL B is given by 

**l b = (15) 

u Umax , 



where e is a free parameter and fixed as e = 0.25 
throughout this paper. 

Equation (14) indicates that the level L par- 
ticles step twice while the level L — 1 particles 
step once. Each step of particles is split into three 
steps such as velocity step by half time interval 
(Ai^/2), position step by full time interval (AtL), 
and velocity step by half time interval (Ai^/2) 
again. For example, when the times of hierar- 
chical meshes of level Lo and larger synchronize, 
their mesh structure is modified simultaneously in 
accordance with the refinement criterion. When 
this synchronization occurs at t i} the level L par- 
ticles, for which L > L , step their velocity from 
ti — AtL /2 to ti + At /2 , where L and L' are the 
particle's levels at ti — AtL and ti, respectively. 
The force on particles is interpolated from the cor- 
ners of the level L' refined cells. Hence, even if the 
cell to which a particle belongs is removed, the par- 
ticle can step their velocity properly. As far as the 
particle's levels at ti — AtL and ti are the same, the 
integration of its trajectory is the same as that by 
the leapfrog method. The pseudo-code describing 
this flow of procedures is given in Appendix B. 

2.4. Mass assignment and force interpola- 
tion 

In our code the mass of particles is assigned to 
the corners of the neighboring cells, and the force 
on particles is interpolated from them. Among 
various assignment and interpolation schemes for 
the PM method (see e.g. Hockney & Eastwood 
1988), we adopted the cloud- in-cell (CIC) scheme: 

Pl,m,n = ^2 m p lllWL B (x - Xl, m M7) 

all particles 

ffO") = X! 9l,m, n WL B (x-Xl, m (^) 

0<l,m,n<M B 

W L (x) = A L {x) A L (y) A L {z), (19) 
and 

A L (x) = (i-^M if N< *l; (20 ) 
w \0 otherwise, v ; 

where Il b , m p and xi im , n are the mesh spacing, 
the mass of particles and the position of cell cor- 
ners, respectively. The force on each node is calcu- 
lated by two-point differencing scheme which uses 
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the difference equation for equation (5): 

9x l,m,n = (<Pl+l,m,n ~ <fil-l,m,n) j (21) 

g y l,m,n = ~7T1 (<Pl,m+l,n ~ <t>l,m-l,n), (22) 

Zal Ls 

and 

g Z l,m.n = — 7T1 (<Pl,m.n+l ~ <Pl,m,n-l)- (23) 

The CIC scheme is applied also to the hierar- 
chical meshes. The mass of particles is assigned to 
the corners of all levels of cells including the par- 
ticles, while the force onto the level L particles is 
interpolated from those on the corners of the level 
L refined cells: 

Pl{Xu) = ^2 m p ll 3 W L (x p - x n ), (24) 
peP L 

and 

9l( x p) = 9 L ( x n)W L (x p -x n ), (25) 

n£N L 

where Pl and Nl are the sets of particles in level L 
cells and nodes of level L cells, respectively. We re- 
peatedly note that the particle's level is defined as 
the level of the finest refined cell including the par- 
ticle. Thus, level L particles can reside in buffer 
cells whose level is larger than L. 

2.5. Poisson solver 

We adopt the multigrid method (Brandt 1977) 
to solve the discretized Poisson equation for the 
base mesh. Equation (6) is discretized as 

L(pl, m ,n = Pl,m,n, (26) 

where L is the discretized Laplacian operator and 
stands for 

Lb 

+ <Pi+i 

0i,m-l,n + 0/,m+l,n + 

+ 4>i ,m,n+l 

6 <j>l,m,n)i 

and, the indices I, m and n specify the position of 
nodes on the base mesh. Since we do not know the 



exact solution of this equation initially, we try to 
estimate it iteratively. The error (Sip) is defined 
as the difference between the solution (<p) and the 
estimate (ip): 

5ip = (p-ip. (27) 

Roughly, Sip can be split into the short and long 
wavelength modes. The short wavelength modes 
converge quickly by the basic stationary itera- 
tive methods, such as the red-black Gauss-Seidel 
method which updates ip by solving equation (26) 
about (pi.j^k and substituting ip into (p: 

1pl,m,n = g (lpl-l,m,n + 1pl+l,m,n + 
1pl,m-l,n + 1pl,m+l,n + 
1pl,m,n-l + V'/.m.n+l - 

^Gl\ B aT l pi, m ,n)- 

This updating of estimate is not carried out all at 
once, but through two steps which do not update 
the estimate on a node and its six neighboring 
nodes simultaneously. For example, ip2i,2m,2m 

1p2l+l,2m,2n+l and 1p2l,2m+l,2n+l 

are updated in the first step, then ip2i+i,2m,2n, 

1p2l,2m+l,2n, 1p2l,2m,2n+l and 1p2l+l ,2m+l ,2n+l i n 

the second. 

On the other hand, the long wavelength modes 
converge slowly. This drawback of the iterative 
methods is overcome in the multigrid method as 
follows: First, the Poisson equation is rewritten as 

LcP = L(iP + SiP) = p, (28) 
LSip = p-LiP = 5p, (29) 

where Sp is called the residual. Note that equa- 
tion (29) relating the error to the residual keeps 
the same form as the original Poisson equation. 
Next, we prepare meshes for each level from to 
Lb- The residual on the base mesh is projected 
onto the level Lb — 1 mesh by the fine-to-coarse 
operator (see Appendix C), then the error on the 
base mesh is relaxed using the red-black Gauss- 
Seidel method. In the two-grid method, the error 
on the level Lb — 1 mesh is projected back to the 
base level and added to the estimate. This pro- 
cedure proceeds further in the multigrid method 
by estimating the error on the level Lb — 1 mesh 



using the level Lb — 2 mesh, and the error on the 
level Lb — 2 mesh using the level Lb — 3 mesh and 
so on. 

In the multigrid method described so far, it- 
eration begins from the base mesh, but in the 
full multigrid method, it begins from the level 
mesh where the solution is given trivially. Figure 
5 schematically shows the schedule of a sequence 
of relaxation, coarse-to-fine and fine-to-coarse op- 
erations. Since the iteration from level I to 
then from to L is V-shaped, this multigrid it- 
eration is called a V-cycle. Further description of 
the multigrid algorithm and the terminology used 
here is found in e.g. Briggs (1987) and Press et al. 
(1992). 

In our code, the potential on the hierarchical 
meshes is solved as follows: First, the full multi- 
grid method is applied to solve the Poisson equa- 
tion on the base mesh. Then the projection of the 
potential on the level L — 1 meshes onto the level 
L meshes and the two grid iteration using the level 
L — 1 and L meshes are executed from the level 
Lb + 1 meshes to the Lb meshes. 

As the particles belonging to the hierarchical 
cells change the position, the potential on hierar- 
chical meshes must be modified. After the level 
L particles step their position at time t + [2n + 
1)2 Lb - l M Lb , provided < (2n + 1)2 Lb ~ l < 1, 
two grid iteration is successively applied from the 
level L — 1 meshes to the level Lb meshes. Fig- 
ure 6 exhibits this AMR Poisson solver's schedule 
for the coarse-to-fine and fine-to-coarse operations 
and relaxation in the case oi Lb = 2 and Lb = 4. 

The boundary values of the hierarchical meshes 
are defined on nodes in buffer cells (Fig. 7). The 
potential at the boundary is calculated by the 
coarse-to-fine operator from the lower level follow- 
ing the full weighting scheme. Since the potential 
on the level L — 1 meshes reflect the density dis- 
tribution on the level L meshes after the two grid 
iteration among them, the potential at the bound- 
ary is corrected from the initial guess which is pro- 
jected from the lower level meshes. 

We first wrote a code using the nested iteration 
which applies the coarse-to-fine projection and the 
one level red-black Gauss-Seidel iteration instead 
of the two grid iteration. But a slight discrepancy 
was observed between the correlation functions in 
the LCDM model calculated by the nested itera- 



tion code with HTS and the code without HTS, 
though the difference is negligible between the re- 
sults calculated by the two grid iteration code with 
and without HTS (§3.4). Hence, it is crucial to 
adopt the two grid or the multi grid potential 
solver to implement the HTS technique. 

3. Tests of the code 
3.1. Force accuracy 

As the first test, we have checked whether 
the force between particles is calculated correctly. 
First, we placed a truncated singular isothermal 
sphere. Following the mesh refinement in accor- 
dance with the way described in §2.2, the force 
on the particles is calculated. Then we placed a 
fiducial particle in a level Lb cell randomly and 
recalculate the force on all particles. Subtract- 
ing the previously calculated force from this force, 
we obtained the gravitational force from the fidu- 
cial particle on the rest of particles. We repeated 
the above process 32 times. In figure 8 the force 
from the fiducial particle to the rest of particles 
is plotted against distance between them. Circles, 
squares, and crosses represent the cases of Lb= 5, 
7, and 9, respectively, while the level of the base 
mesh is fixed as Lb=5. In this test, the spacing 
of the base mesh, the mass of particles, and the 
gravitational constant are normalized to unity. Al- 
though the force is softened when the separation 
is much smaller than the spacing of the level L b 
mesh in each case, the force coincides with that 
expected from the Newtonian law at radius suffi- 
ciently larger than this spacing but smaller than 
the box size. 

We also put a homogeneous sphere consists of 
massless particles added to a massive particle at 
the center. Setting N r ^ n = 1 so that all particles 
belong to the level Lb cells, we calculated the force 
on the massless particles from the central particle. 
Since our code adopts the periodic boundary con- 
dition, we compared this force not with the Newto- 
nian law, but with the sum of the Newtonian force 
from periodically placed particles which is calcu- 
lated efficiently using the Ewald expansion (Ewald 
1921; Sangster & Dixon 1976). Figure 9 shows the 
relative error of the force calculated by our code 
in comparison with that calculated by the Ewald 
expansion. Circles, squares, and crosses represent 
the cases of Lb=Q, 8, and 10, respectively, with 
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Lb = 6 in common. In any cases, the error is 
largest at r ~ 21l d but is kept within 20%. 

3.2. Single plane wave 

Zcldovich (1970) formulated the liner approxi- 
mation for particle trajectories: 

x = q + b{t)p{q), (30) 

where q is the Lagrangian position of the particle 
and b(t) is the growing mode of the linear den- 
sity perturbation normalized so that p(q) gives 
the initial positional perturbation. This Zcldovich 
approximation is not only applicable to particles 
in the sub-linear regime, but also gives their ex- 
act trajectories for a system of one-dimensional 
sheets until any pairs of sheets crosses each other. 
Connecting the trajectories before and after sheet 
crossings, the trajectories of sheets are calculated 
with high accuracy. We test our code by com- 
paring the trajectories calculated by our code and 
those calculated by Yano's one-dimensional code 
for which the details are described in Yano & 
Gouda (1998). 

Initially we have imposed a single sinusoidal 
plane wave perturbation to the Einstein-de Sitter 
(EdS) universe: 

p(q) = Asm(kq), (31) 

k = ^, (32) 

and 

A = 1Q- 2 1 Lb , (33) 

where Iq and Il b are the size of simulation box 
and the base mesh spacing, respectively. In this 
test, the level of the base mesh is fixed as = 5. 

The following equations for position, velocity, 
and density hold until the first caustic appears: 

x = q + a(t)Asin(kq), (34) 

v = a(t)Asin(kq), (35) 

and 

> - Ktr ,36 » 

= p{\ + a(t)Akcos(kq))- 1 , (37) 



where the scale factor a(t) is normalized to unity 
at the initial time. When p diverges to infin- 
ity, the first caustics form. This occurs at q = 
Trk" 1 = l /2 when a(t) = Oi = (Ah)' 1 . We 
run the simulations setting A = 0.01 with sheets 
consisting of L B = 25 particles arranged on the 
grid. Figure 10 shows the distribution of sheets in 
the phase space at a(t) — a\ taken from the four 
runs using the one-dimensional code (solid line), 
the PM code {Lb = 5, crosses), our code with 
HTS {Ljj — 7, circles), and our code without HTS 
{Lb = 7, squares). Although the cells bordering 
on the caustics are refined only once, the velocity 
degradation for Lb — 7 is weaker than that for 
L D = 5. 

The difference between the Lb =5 and 7 re- 
sults is more prominent when the second caustic 
appears at a(t) = a 2 — 2.34 a\ (Fig. 11). Near 
the center of the spiral pattern in the phase space, 
the Lb = 5 result is wound weaker than the other 
two Lb = 7 results. No obvious difference be- 
tween the Lb = 7 results with HTS and without 
it shows that our implementation of the HTS is 
very successful. 58 base level time steps are used 
from the beginning of the simulation to the sec- 
ond caustics generation. Note that the positions 
of the first caustics in the three runs are the same. 
This is because sheets in low resolution run are 
pulled toward the center more weakly than those 
in high resolution runs, not only when they fall to- 
ward the center, but also when they leave it in the 
same way. Thus, the position of caustics does not 
change much by the resolution of the code. This 
is the reason why sheets' position in the phase 
space is different among low resolution run and 
high resolution runs, even though the position of 
the caustics are the same. This is true not only 
for the plane wave test, but also for the spherical 
infall test described in the next. 

3.3. Spherical self-similar infall 

As the third test, our code is checked against 
the spherical self-similar infall model where the 
infall of collisionless material onto the collapsed 
density perturbation in the EdS universe is con- 
sidered. This model is solved semi-analytically 
(Fillmore & Goldreich 1984; Bertschinger 1985). 
As its name indicates, this infall model possesses 
the symmetry which is different from the planar 
symmetry of the base mesh in our code and the 
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single plane wave test. Moreover, the infall onto 
the virialized objects is thought to be common in 
the hierarchical clustering. For these reasons, this 
model is widely used as a test for cosmological sim- 
ulation codes (e.g. Splinter 1996; Kravtsov et al. 
1997; Gelato et al. 1997; Yahagi, Mori, & Yoshii 
1999). 

In order to prepare the initial particle distri- 
bution for this test problem, we place AN par- 
ticles at the center of particles distributed homo- 
geneously and amorphously. Such a distribution 
is obtained as follows (White 1993): First, parti- 
cles are distributed randomly. Next, the system 
is evolved in the EdS universe reversing the sign 
of the force to make particles repulsive each other, 
and is expanded million times from the initial time 
until the kinetic energy of the system converges to 
be negligibly small. 

In this test, we set Lb = 7, Ln = 10, AN = 
64, and use 2 3Lb = 128 3 particles. We normalize 
the time by the initial age of the universe, r = 
3Hit/2, and the radius by the turn-around radius, 
r ta , given by 



3^V 8/ V3 AN^ 1/3 



4?r N 



(38) 
kr, (39) 



where N, Si and Ri are the total number of par- 
ticles, the initial density contrast and its radial 
extension, respectively. Figure 12 shows the ra- 
dial density profile calculated by our code (cir- 
cles). Crosses connected by the solid line denote 
the semi-analytic solution taken from Table 4 of 
Bertschinger (1985). Our result shows good co- 
incidence with the semi-analytic solution includ- 
ing the position of the first caustic and the power 
law asymptote of density profile toward the center 
whose index is —9/4. 

3.4. Halos in the CDM universe 

Finally, we have performed three LCDM simu- 
lations, using 64 3 particles and Lb = 6 in com- 
mon. We adopt Lb = 6 for the PM run and 
L D = 10 for the AMR+HTS and AMR runs. The 
difference between the AMR+HTS and AMR runs 
is only that the AMR+HTS run adopts the HTS 
while the AMR run docs not. For these simula- 
tions, we adopt fig = 0.3, Aq = 0.7, h = 0.7 and 



<J8 = 1.0. The size of simulation box is 70ft -1 
Mpc and the mass of particles is 1.5567 x 1O 11 M . 
The initial data of the simulations are generated 
by the COSMICS code provided by Bertschinger 
& Bode using the analytic fitting function for the 
power spectrum of the CDM universe described in 
Bardeen et al. (1986). 

Figure 13 shows the map of projected density 
in the logarithmic unit at z = 0. Filtering is not 
applied to the density maps in any panels. The 
overall mass distributions from the AMR+HTS 
and AMR runs agree well with each other, and 
are more centrally concentrated than that from 
the PM run. This is confirmed by the compari- 
son among the particle distributions in a slice of 
1/16 thickness of the side (Fig. 14), and the two- 
point correlation functions from the AMR+HTS, 
AMR, and PM runs (Fig. 15). There are many 
cosmological N-hody simulations with parameters 
similar to our LCDM simulations. Figure 16 shows 
the correlation function from our AMR+HTS run 
(thick dotted line) overlaid with those from the PM 
run in Klypin et al. (1996) (crosses, hereafter 
KPM run), AP 3 M run in Jenkins et al. (1998) 
(dashed line), and ART run in Colin et al. (1999) 
(solid line). Data of overlaid three runs are taken 
from figure 4 of Colin et al. (1999). Vertical 
solid lines shown with e AMR+H Ts, £pm, £ A p 3 m rep- 
resent the force resolution of our AMR+HTS run, 
KPM run, and the AP 3 M run in Jenkins et al. 
(1998), respectively. Among the three LCDM sim- 
ulation runs, the size of the simulation box and 
the force resolution of the KPM run is closest 
to those of our AMR+HTS run. The root mean 
square amplitude of the fluctuation at 8 h _1 Mpc 
(as) of our run is 1.0 while the KPM run adopted 
as = 1.1. Although the number of particles used 
in the AMR+HTS run is 1 /64 of particles in the 
KPM run, magnitude of the correlation function 
at r = 2e 



AMR+HTS 



of the AMR+HTS run coincides 
with that of the KPM run within 10%. 

We have also compared the density profiles 
of the halos in the AMR+HTS and AMR runs. 
Figure 17 shows the density profiles of halos 
in the AMR+HTS run (circles) and the AMR 
run (squares). The universal profile proposed by 
Navarro, Frcnk, & White (1996), 



p(r) 



(r/r s )(l+r/r s ) 2 



(40) 
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is fitted to the halos in the AMR+HTS run, and 
the result is denoted by the solid line in each panel 
of Figure 17. This universal profile provides a good 
fit to the dark halos over the entire range of radius. 
The profiles of the halos from the AMR+HTS run 
are in agreement with those from the AMR run. 
Slight discrepancies between them are primarily 
due to the different positions of satellite halos in 
these two runs. This agreement of halo profiles be- 
tween the AMR+HTS and AMR runs also indicate 
that our implementation of the HTS is successful. 

Finally we compare the performance of the 
AMR+HTS and AMR codes on a PC with an 
AMD's Athlon 750 MHz processor. The LCDM 
simulations for which we set Lb — Lb = 4 are 
performed from a=0.0366 to a=l. Thereby the 
AMR+HTS code uses 133 base-level time steps, 
while the AMR code uses 1932 global time steps. 
The CPU time spent by these codes is plotted 
against a in figure 15. As seen from this figure, 
the AMR+HTS code spends less than a half of 
the CPU time spent by the AMR code. Thus 
the AMR+HTS code saves much of the CPU 
time, while keeping the same spatial resolution 
as the AMR code. Higher performance of the 
AMR+HTS code is achieved when we set a higher 
Ld — Lb- 

4. Summary 



their environments (see. Appendix A). Sorting 
the hierarchical cells by the Morton ordering is a 
preparatory step for the parallelization. 

Finally it is worth mentioning that we have al- 
ready combined our TV-body code with the hydro- 
dynamics code (Yahagi & Yoshii 2000). This com- 
bined code enables us to study the dynamical evo- 
lution of both dark matter and baryonic compo- 
nents. Including some physical and phenomeno- 
logical processes such as cooling of the gas, star 
formation, energy feedback from supernovae etc, 
we will trace the formation process of galaxies un- 
der the realistic condition in the universe. 

We would like to thank Shu-ichiro Inutsuka 
for his useful comments on the Poisson solver of 
our code and Taihei Yano for providing us the 
data for the single plane wave test using his one- 
dimensional code for collisionless systems. Some 
calculations were conducted using the resources of 
the the Astronomical Data Analysis Center of the 
National Astronomical Observatory, Japan. H.Y. 
acknowledges the JSPS Research Fellowships for 
Young Scientists. This work has been supported 
in part by the Grant-in-Aid for COE research 
(07CE2002). 



We have described two techniques of AMR and 
HTS which are able to increase the spatial and 
time resolutions of the PM method. The AMR 
divides and removes cells in each time step in 
compliance with the refinement criterion, and en- 
hances the spatial resolution. On the other hand, 
the HTS changes the time step interval depend- 
ing on the level of particles and enhances the time 
resolution. Three different tests described in §3 
show that our implementation of AMR and HTS 
is successful. Compared with the previous codes 
with AMR, we describe the HTS technique in de- 
tail. It is found that the HTS is an indispensable 
technique for the wide dynamic range simulation 
because importance of the HTS increases as the 
finer cells are introduced into the simulation. 

We will parallelize our code for massively par- 
allel processors to realize simulations with more 
than 10 9 particles which are needed to study 
the links between the formation of galaxies and 
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A. Crude estimation for the number of particles required to resolve halos of field galaxies 



The relaxation time scale for a halo is given by (see e.g. Binney & Tremaine 1987) 

0.1N 

^relax rs "' ~J J^~t cross (Al) 

where N is the number of particles which compose the halo. For a halo formed at z — z co u in the Einstein-de 
Sitter (EdS) universe, we have 

trelax ~ ^ (178 Gp (1 + Z^f)- 1 ' 2 . (A3) 

In order to calculate the evolution of the halo correctly the relaxation time scale must exceed the age of the 
halo at z = 0, i.e. 

trelax ^ ^age (-^-4) 

> \h^\1-{1 + z cM ?/ 2 ). (A5) 

The required minimum number of particles, N re i ax , can be estimated crudely by solving the following non- 
linear equality: 

trelax {N re l ax ^ — tage- (-^-6) 

Thus the total number of particles, required to keep simulated galactic halos unrelaxed, is given by 

N tot ai > ^^N relax (A7) 

Mhalo 



3 

2 ; 



a " 8 (i5§ife)~ (ife)' < A8 » 

- 10 9 , (L ~ 32Mpc, z co a ~ 3, EdS) (A9) 

where Mhalo is the minimum mass of halos which do not relax until the simulation is completed, and L 
is the size of simulated region. For the reference values of Mhalo = 1O 11 M0,L = 32Mpc, z co u = 3, and 
(£lo,h) = (1,0.5), we have N to tai ^ 10 9 for the EdS universe. 

The above estimate is based on many assumptions. For example, halos are assumed to have been isolated 
after they formed, although there are cluster galaxies which interact with the cluster and other galaxies, and 
these interactions can increase the required number of particles. Hence, this estimation is quite naive and 
optimistic, but it gives the lower limit of the number of particles necessary to reproduce un-relaxed galactic 
haloes in simulations. 



B. Pseudo-code 

base_step 
{ 

int L; 

timestepsO ; 

velocity -half step (L% <L<L D ) ; 

if (L B < L D ) hierarchical step (Lg + 1) ; 
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position step (\^, <L<Lp) ; 
meshjmodification(L-Q + 1 <L<Lp) ; 
potential solver (Lg <L<Lq) ; 
velocity -half step (Lg <L<Lp) ; 

} 

hierarchical-step (int Lo) 
{ 

int L; 

if (Lo < Lp) hierarchical step {Lq + 1) ; 

positionstep (L <L<Lrj) ; 

meshjmodi fication (Lo + 1 <L<Lq) ; 

potential solver (Lq <L<Lq) ; 

velocity -full step (Lo <L<Lp) ; 

if (Lo < Lp) hierarchical step (L + 1) ; 

} 



C. Coarse-to-fine and fine-to-coarse operators 

The full-weighting scheme for the fine-to-coarse and the coarse-to-fine operators are adopted as 



L-l 

m,n 



A.L + 1 

^ 1 2i,2m,2n 

A L + 1 
A L + 1 
A L + 1 

^ 1 2i,2m,2n+l 
A.L + 1 

A L + 1 

A L + 1 



16 
J_ 

32 



A: 



2/.2m,2n 



(^2/-l,2m,2n 
(■^2i-l,2m-l,2n 
^2;-l,2m,27i-l "t 

A" 



^2i+l,2m,2n + ^2i,2m-l,2n + ^2/,2m+l,2n 



^ 1 2i,2m,2n- 



1 + ^2i,2m,2n+l 



) + 



+ A. 



2Z,2m-1.2n-l 



1 2/+l,2m-l,2n 
L 

2l+l,2m,2n-l 
A L 

/1 2Z,2m+l,2n-l 



2/-l,2m+l,2n 



+ A 



2J+l,2m+l,2n 



+ 



^2i-l,2ro,2ra+l + ^2i+l,2m,2ri+l + 



/1 2;,2m-l,2n+l 



-^2l,2m+l,2n+l) + 



,4 



2Z-l,2m-l,2ra-l 
-^■2i-l,2m-l,2n+l ^ 



4- 4 L 

r /1 2; + l,2m-l,2n-l 
' ^2/+l,2m-l,2n+l ^ 



I" ^2i-l,2ro+l,2ra-l 
^2Z-l,2ro+l,2n+l ^ 



4 



,m,n 



A+l,m,n) 



(A,m,n + ^+l,ro,n + ^i,ro+l,rt + A+l,m+l,n)' 
|£ i aL aL i iL 



{-^l,m,n + A+l,ra,n + ^l,m,n+l + A+l,m,n+l)> 



^ ^2J+l,2m+l,2n-l 
^-2/+l,2m+l,2n+l)! 



and 



1 4- 4 L 4- 4 L 4- A L 4- 

^ 1 2i+l,2m+l,2n+l — 8 ^/,m,rc ^ ^ 1 /+l : m,n ' ^,m+l,n ' ^^m^+l ^ 

A+l,m+l,n + A + l,m,n+l + A,m+l,n+l + A+l,m+l,n+l)- 

Here, Af- k represents any variables defined on the node which is placed at (x,y,z) — (i Il,J lh,k II) on 
the level L mesh. 
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Fig. 1. — Data structure of the building block 
which is used to construct the hierarchical meshes. 
Eight half-sized cells having the same parent are 
grouped into a cell octet and share the pointer to 
the parent and its six neighbors. 
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■ -OOO OO CXXXKXXXXXXXXXX) — oo oo 



L B +3 
Lb+2 
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Un-refined 
cell 



Buffer 
cells 



Refined 
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Fig. 3. — Three types of the hierarchical meshes 
such as unrefined, refined and buffer cells. All cells 
which satisfy the refinement criterion are subdi- 
vided and called refined cells (shaded). Those cells 
which are next to the refined cells are also subdi- 
vided and called buffer cells. The others remain 
as unrefined cells. 



Fig. 2. — Construction of the hierarchical meshes 
by connecting cell octets as shown. These cell 
octets are added or removed dynamically as the 
system evolves. The cells without the child cell 
octet use their pointer to the child octet as the 
pointer to the head of the linked listed particles 
which reside in the cell. 



Fig. 4. — An example of the hierarchical mesh 
distribution in the iV-body code with AMR for 
the case of the LCDM universe described in §3.4. 
When particles are distributed as shown in the 
left panel, hierarchical meshes are placed as shown 
in the right panel. The shape of the hierarchical 
meshes is not geometrically restricted. 
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Fig. 5. — Schematic illustration of the full multi- 
grid method. The full multigrid method acceler- 
ates the relaxation process by reducing the long 
wavelength mode of the residual using the coarser 
meshes. The full multigrid iteration begins from 
the coarsest mesh, i.e., the whole simulation box. 
Potential on the level Lq mesh is solved using level 
L meshes where < L < Lq. 



Fig. 8. — Pairwise force in the AMR code. Circles, 
squares, and crosses are for Lb =5, 7, 9, respec- 
tively, with Lb = 5 in common. The size of the 
base mesh, the mass of particle, and the gravita- 
tional constant are normalized to unity. The solid 
line shows the exact Newtonian force of g — r~ 2 . 
The vertical lines at the bottom indicate the size 
of the smallest cell for three cases. 



Fig. 6. — Schedule of the Poisson solver in our 
AMR code for the case oi Lb = 2 and Lb = 4. 
At time t, we execute the full multigrid iteration 
first. Then we apply the two grid iteration from 
Lb + 1 to Lb ■ When the level L particles step their 
position, the two grid iteration is applied from L — 
1 to Lb- 



Fig. 9. — Error of the pairwise force calculated 
by the AMR code. Circles, squares, and crosses 
are for Lb =6, 8, 10, respectively, with Lb = 6 
in common. Error is defined as the relative er- 
ror of the calculated force using the AMR code 
in comparison with that calculated by the Ewald 
expansion. In all cases, the error is maximized at 
r ~ 21l d , but is kept within 20%. 
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Fig. 7. — Location of inner nodes (circles) and 
outer nodes (crosses) on which the potential and 
the boundary condition are defined respectively. 
The inner nodes are placed on the corners of re- 
fined cells, and the outer nodes in buffer cells. 
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Fig. 10. — Snapshot of the phase diagram from the 
single plane wave test at the epoch of first caustic 
generation. The solid line shows the exact solu- 
tion calculated by the one-dimensional code with 
1024 sheets using the code described in Yano & 
Gouda (1998). Crosses, circles, and squares show 
the results obtained by the code without AMR, 
with AMR and HTS, and with AMR but without 
HTS, respectively. Because the meshes are refined 
only once at this epoch, the difference between the 
AMR and non-AMR results is minor. We however 
note that this difference becomes larger as time 
proceeds beyond the first caustic generation (see 
Fig. 11). 
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Fig. 11. — Snapshot of the phase diagram from 
the single plane wave test at the epoch of second 
caustic generation. The same as figure 10, but 
for <Z2 — 2.34 ai where a% and ai are the scale 
factors at the epoch of first and second caustics 
generation, respectively. Although blunt in the 
non-AMR run, the second caustics are well cap- 
tured in the AMR result, irrespective of whether 
the HTS is included or not. 




Fig. 12. — Density profile of the spherical self- 
similar infall model. Crosses connected by the 
solid line show the semi-analytic solution provided 
in Table 4 of Bertschinger (1985). The circles show 
the result by our code, where radius and density 
are normalized by the turn around radius and the 
mean density, respectively. Our result shows good 
agreement with the semi-analytic solution includ- 
ing the first caustic. 



Fig. 13. — Map of projected density in the loga- 
rithmic unit at z = for the LCDM universe with 
n Q = 0.3, A = 0.7, h = 0.7 and a s = 1.0. The size 
of simulation box is 70h~ l Mpc, and the number of 
particles is 64 3 and the level of the base mesh, Lb , 
= 6. Three panels are taken from (a) AMR run 
with L D = 10, (b) AMR+HTS run with L D = 10, 
and (c) PM run with Lb = Lb = 6. The overall 
mass distributions of (a) and (b) agree well with 
each other, and their halos are bound more tightly 
than those in (c). 

Fig. 14. — The same as figure 13. Distributions 
of particles in the LCDM simulation are shown 
in this figure. Only particles in a slice of 1/16 
thickness of the simulation box side are shown. 
This figure confirms the trends described in the 
caption of figure 13. 
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Fig. 15. — Two-point correlation function at z = 
for the LCDM universe with Qq = 0.3, Ao = 
0.7, h = 0.7 and erg — 1.0, based on the same sim- 
ulations as in figure 13. Shown are the results 
from the AMR+HTS run (solid line), the AMR 
run (dotted line), and the PM run (broken line). 
Note that the AMR+HTS and AMR runs give al- 
most identical results, so that the difference is not 
visible in this figure. The vertical lines indicate 
the minimum size of the cell, or the force reso- 
lution of the codes. Because of the low spatial 
resolution the correlation function at small sep- 
arations in the result of the PM run is strongly 
weakened compared with the other two runs. 
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Fig. 16. — Comparison of the correlation function 
of LCDM simulations. Shown are our AMR+HTS 
run (thick dotted line), the PM run in Klypin et al. 
(1996) (crosses), the AP 3 M run in Jenkins et al. 
(1998) (dashed line), and the ART run in Colin et 
al. (1999) (solid line). Vertical solid lines shown 
with Eamr+hts; Epm, e AP 3 M represent the force res- 
olution of our AMR+HTS run, KPM run, and the 
AP 3 M run in Jenkins et al. (1998), respectively. 



Fig. 17. — Density profiles of halos detected at 
z = for the LCDM universe with Q — 0.3, A = 
0.7, h — 0.7 and er 8 = 1.0, based on the same simu- 
lations as in figure 13. Shown are the results from 
the AMR+HTS (circles) and AMR (squares) runs. 
The solid lines show the fits to the halos in the 
result of the AMR+HTS run using the universal 
density profile (Eq. 40) . 




Fig. 18.— CPU time spent by the AMR+HTS run 
(dotted line) and the AMR run (dashed line) on a 
PC with an AMD's Athlon 750Hz processor. The 
AMR+HTS run spends only a half of the CPU 
time spent by the AMR run. 
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